METHODS, NUCLEOTIDE SEQUENCES AND HOST CELLS FOR
ASSAYING EXOGENOUS AND ENDOGENOUS PROTEASE ACTIVITY

TECHNICAL FIELD OF INVENTION 
The invention relates to methods for assaying
exogenous protease activity in a host cell transformed
with nucleotide sequences encoding that protease and a
specialized substrate. It also relates to methods for
assaying endogenous protease activity in a host cell
transformed with nucleotide sequences encoding a
specialized substrate. When these nucleotide sequences
are expressed, the exogenous or endogenous protease
cleaves the substrate and releases a polypeptide that is
secreted out of the cell, where it can be easily
quantitated using standard assays. The methods and
transformed host cells of this invention are particularly
useful for identifying inhibitors of the exogenous and
endogenous proteases. If the protease is a protease from
an infectious agent or is characteristic of a diseased
state, inhibitors identified by these methods are
potential pharmaceutical agents for treatment or
prevention of the disease.

BACKGROUND ART 

Proteases play an important role in the
regulation of many biological processes. They also play
a major role in disease. In particular, proteolysis of
primary polypeptide precursors is essential to the
replication of several infectious viruses, including HIV
and HCV. These viruses encode proteins that are
initially synthesized as large polyprotein precursors.
Those precursors are ultimately processed by the viral
protease into mature viral proteins. In light of this,
researchers have begun to concentrate on inhibition of
viral proteases as a potential treatment for certain
viral diseases.

Proteases also play a role in non-infectious
diseases. For example, changes in normal cellular
function may cause an undesirable increase or decrease in
proteolytic activity. This often leads to a disease
state.

The ability to detect viral or mutant protease
activity in a quick and simple assay is important in the
biochemical characterization of these proteases and in
the screening and identification of potential inhibitors.
Several such assays have been described in the art.

T. M. Block et al., Antimicrob. Agents
Chemother., 34, pp. 2337-41 (1990) described a prototype
assay for screening potential HIV protease inhibitors.
This assay involved cloning the HIV protease recognition
sequence into the tetracycline resistance gene (Tet^r) of
pBR322 and cotransforming E. coli with the modified Tet^r
gene and the gene encoding the HIV protease. Coexpression
of these two genes caused tetracycline sensitivity.
Potential inhibitors were identified by their ability to
restore tetracycline resistance to the transformed
bacteria.

E. Sarubbi et al., FEBS Lett., 279, pp. 265-69
(1991) described another assay for detecting HIV protease
inhibitors that utilized an HIV-1 Gag-β-galactosidase
fusion protein and a monoclonal antibody that bound to
the fusion protein in the gag region. Coexpression of
the HIV protease and the fusion protein led to cleavage
of the latter and abolished monoclonal antibody binding.
Potential inhibitors were identified by increased binding
of the monoclonal antibody to the fusion protein.

T. A. Smith et al., Proc. Natl. Acad. Sci. USA,
88, pp. 5159-62 (1991), B. Dasmahapatra et al., Proc.
Natl. Acad. Sci. USA, 89, pp. 4159-62 (1992) and M. G.
Murray et al., Gene, 134, pp. 123-28 (1993) each
described protease assay systems utilizing the yeast GAL4
protein. Each of these authors described inserting a
protease cleavage site between the DNA binding domain
and the transcriptional activating domain of GAL4.
Cleavage of that site by a coexpressed protease renders
GAL4 transcriptionally inactive, leading to the inability
of the transformed yeast to metabolize galactose.

H.-D. Liebig et al., Proc. Natl. Acad. Sci.
USA, 88, pp. 5979-83 (1991), disclosed the use of a fusion
protein consisting of a self-cleaving protease fused to
the α fragment of β-galactosidase to assay protease
activity. Active forms of the protease cleaved
themselves off of the fusion protein, and the resulting
protein was able to carry out α-complementation. Fusions
containing inactive protease were unable to perform
α-complementation.

Y. Komoda et al., J. Virol., 68, pp. 7351-57
(1994) described an assay to identify HCV protease
cleavage sites within the HCV precursor polyprotein.
These authors created chimeric proteins comprising
various portions of the HCV precursor polyprotein
inserted between the E. coli maltose binding protein
and dihydrofolate reductase. If the HCV portion of these
chimeras contained a cleavage site, the chimera was
cleaved when it was coexpressed with HCV protease in
E. coli. Cleavage of the chimera was determined by
SDS-polyacrylamide gel electrophoresis of E. coli lysates.

Y. Hirowatari et al., Anal. Biochem., 225, pp.
113-120 (1995) described another assay to detect HCV
protease activity. In this assay, the substrate, the HCV
protease and a reporter gene are cotransfected into COS
cells. The substrate is a fusion protein consisting of
(HCV NS2)-(DHFR)-(HCV NS3 cleavage site)-Tax1. The
reporter gene is chloramphenicol acetyltransferase (CAT)
under control of the HTLV-1 long terminal repeat (LTR) and
resides in the cell nucleus following expression. The
uncleaved substrate is expressed as a membrane-bound
protein on the surface of the endoplasmic reticulum due
to the HCV NS2 portion. Upon cleavage, the released Tax1
protein translocates to the nucleus and activates CAT
expression by binding to the HTLV-1 LTR. Protease
activity is determined by measuring CAT activity in a
cell lysate.

Despite these developments, no one has yet
developed a protease assay system that can be carried out
with higher eukaryotic cells, that is quantitative, and
that does not require cell lysis prior to quantitation.
Avoiding cell lysis prior to quantitation is desirable in
that the assay may be performed more rapidly and with
less manipulation. Also, lysis can often lead to
aberrant results. Thus, there is a need for an accurate
and quantitative cell-based protease assay that can
be carried out in a higher eukaryotic cell without cell
lysis.

SUMMARY OF THE INVENTION 

The present invention fulfills this need by
providing methods for assaying exogenous protease
activity in a host cell expressing that protease. The
methods involve utilizing a host cell expressing a first
nucleotide sequence encoding an exogenous protease and a
second nucleotide sequence encoding an artificial
substrate for that protease. The artificial substrate
comprises a cleavage site for the protease situated at or
near the natural maturation site of a pre-polypeptide,
part of which is secreted following proteolytic
processing. When the host is grown under conditions that
cause expression of the first and second nucleotide
sequences, the exogenous protease cuts the artificial
substrate at the cleavage site, releasing the mature
polypeptide, which is secreted into the growth medium. The
growth medium is then isolated and assayed for the mature
polypeptide.

Alternatively, the invention may be utilized to
assay endogenous proteases, especially when quantitation
of those proteases is difficult due to the inability to
detect or distinguish between the cleaved and uncleaved
native substrate.



According to one aspect of the invention, the
assay is used to quantitate an exogenous viral protease.
Such assays are particularly useful as replacements for
current viral protease assays that require the use of
intact, infectious virus, or where no simple viral model
is available to detect viral protease activity. These
assays may be used to identify and assay potential
inhibitors of viral proteases which, in turn, may be used
as pharmaceutical agents for the treatment or prevention
of viral disease.

This invention also provides host cells
transformed with nucleotide sequences encoding an
exogenous protease and a corresponding substrate, as
well as host cells transformed with a specialized substrate
for an endogenous protease. These host cells may be used in
the methods of this invention.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 depicts the structure of pcDL-SRα296.
Figure 2 depicts the structure of a derivative
of pKV containing the pre-IL-1β coding sequence.

Figure 3, panel A, is an immunoblot, probed with an
anti-NS3 antibody, of cell lysates from cells transfected
with an NS3-wild-type or NS3-mutant NS3-4A-Δ4B-IL-1β
construct, or cotransfected with an NS3-mutant
NS3-4A-Δ4B-IL-1β construct and an NS3(1-180) construct.
Figure 3, panel B, is an immunoblot of the same cell
lysates probed with an anti-IL-1β antibody.

Figure 4 depicts the immunoprecipitation, with an
anti-IL-1β antibody, of the media from 35S-labelled cells
transfected with either an NS3-wild-type or NS3-mutant
NS3-4A-Δ4B-IL-1β construct.

Figure 5 is an immunoblot, probed with an anti-IL-1β
antibody, of cell lysates from cells cotransfected with
NS3-4A and either an NS5A/5B- or CSM-containing pre-IL-1β
substrate.

Figure 6 depicts the immunoprecipitation, with an
anti-IL-1β antibody, of the media from 35S-labelled cells
cotransfected with NS3-4A and either an NS5A/5B- or
CSM-containing pre-IL-1β substrate.

Figure 7 depicts the inhibition of HCV NS3
protease cleavage of pre-IL-1β* by varying concentrations
of VH16075 and VH15924.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides a method for
assaying exogenous protease activity in a host cell
comprising the steps of:

(a) incubating a host cell transformed
with a first nucleotide sequence encoding an exogenous
protease and a second nucleotide sequence encoding an
artificial polypeptide substrate under conditions which
cause said exogenous protease and said artificial
substrate to be expressed;

wherein said substrate comprises:

(i) a cleavage site for said
exogenous protease; and
(ii) a polypeptide that is secreted
out of said cell following cleavage by said
exogenous protease;

(b) separating said host cell from its
growth media under non-lytic conditions; and
(c) assaying said growth media for the
presence of said secreted polypeptide.

As used herein, the term "exogenous protease"
means a protease not normally expressed by the host cell
used in the assay. That term includes full-length
proteases that are identical to those found in nature, as
well as catalytically active fragments thereof.

The choice of exogenous protease to be assayed
is left entirely to the user. The only requirements
are that: (1) the specificity of the enzyme, in terms
of the amino acid residues or sequences at which it
cleaves, be known; (2) the primary structure of at
least the catalytically active portion of the enzyme be
known; and (3) a nucleotide sequence encoding at least an
enzymatically active portion of the protease exists or
can be made, and can be expressed in a heterologous host
cell.

According to a preferred embodiment, the
exogenous protease is a protease encoded by a pathogenic
agent. More preferred is a protease encoded by a
pathogenic virus. Most preferably, the exogenous
protease is the NS3 protease of hepatitis C virus
("HCV").

HCV NS3 protease is a 70 kilodalton protein
that is involved in the maturation of viral polypeptides
following infection. It is a serine protease which has a
Cys-X or Thr-X substrate specificity. It has also been
shown that the protease activity of NS3 resides
exclusively in the N-terminal 180 amino acids of the
enzyme. Therefore, nucleotide sequences encoding
anywhere from the first 180 amino acids of NS3 up to the
full-length enzyme may be utilized in the methods of this
invention. Active fragments of other known proteases may
also be used as an alternative to the full-length
protease.

According to an alternative embodiment, the
invention provides a method for assaying endogenous
protease activity in a host cell comprising the steps of:

a) incubating a host cell transformed with a
nucleotide sequence encoding an artificial polypeptide
substrate under conditions which cause said artificial
substrate to be expressed;

wherein said substrate comprises:

i) a cleavage site for said endogenous
protease; and

ii) a polypeptide that is secreted out of
said cell following cleavage by said endogenous
protease;

b) separating said host cell from its growth
media under non-lytic conditions; and

c) assaying said growth media for the
presence of said secreted polypeptide.

The term "endogenous protease", as used
throughout this application, refers to a protease that
is normally expressed by the host cell. It includes both
wild-type proteases and naturally occurring
mutant proteases with increased or decreased activity.

According to the invention, the artificial
polypeptide substrate used in the methods must comprise a
cleavage site for the protease to be assayed and must be
secreted out of the cell following cleavage by that
protease. Preferably, the DNA encoding the artificial
substrate is derived from a gene or cDNA encoding a
naturally occurring polypeptide that is normally cleaved
and then secreted out of a cell, but not necessarily
cleaved by the cell utilized in the assay.

The DNA encoding that polypeptide is then
modified by inserting, in frame with the polypeptide
coding sequence, nucleotides encoding a cleavage site
that is recognized by the exogenous protease to be
tested. If the cell utilized in the assay is capable of
cleaving the substrate at its native cleavage site, then
the nucleotides encoding the polypeptide's native
cleavage site must be altered so as to render it
uncleavable by endogenous proteases.

The protease cleavage site in the artificial
substrate is preferably inserted within 60 amino acids on
either side of the native cleavage site. Preferably, the
artificial cleavage site is inserted N-terminal to the
native cleavage site. Alternatively, the protease
cleavage site can be created by mutating the native
polypeptide sequence. Such a mutation is preferably
made within 60 amino acids of the native cleavage site,
more preferably N-terminal to that site and within 8-10
amino acids of it; alternatively, the mutation is made
in the native cleavage site itself.
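The positional preferences for a mutated cleavage site can be expressed as a simple classification rule. The following Python sketch is illustrative only: the function name classify_site, the convention of identifying each cleavage site by the residue number of its P1 residue (the residue on the N-terminal side of the scissile bond), and the use of 10 residues (the upper end of the stated 8-10 range) as the "near" window are assumptions, not part of the described method.

```python
# Illustrative sketch; the name classify_site and the P1-residue convention are assumptions.
PREFERRED_WINDOW = 60  # residues on either side of the native cleavage site
NEAR_WINDOW = 10       # upper end of the stated 8-10 residue range


def classify_site(site_p1: int, native_p1: int) -> str:
    """Classify a proposed artificial cleavage site relative to the native site.

    Both arguments are residue numbers of the P1 residue. A negative offset
    means the proposed site lies N-terminal to the native site.
    """
    offset = site_p1 - native_p1
    if offset == 0:
        return "native site itself"
    if abs(offset) > PREFERRED_WINDOW:
        return "outside preferred window"
    if offset < 0 and -offset <= NEAR_WINDOW:
        return "N-terminal, within 10 residues"
    if offset < 0:
        return "N-terminal, within 60 residues"
    return "C-terminal, within 60 residues"


# The native pre-IL-1beta site lies between Asp116 and Ala117, so its P1 residue is 116.
for p1 in (116, 110, 80, 130, 200):
    print(p1, classify_site(p1, 116))
```

The ordering of the checks mirrors the order of preference in the text: the native site itself, then the N-terminal near window, then the broader 60-residue window on either side.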
Alteration of the native cleavage site to
render it uncleavable by the host cell may be achieved,
if necessary, by insertion, deletion or mutation of
nucleotides at that site.
Insertion of the protease cleavage site into
the substrate and alteration of its native cleavage site
may be accomplished by any combination of a number of
recombinant DNA techniques well known in the art, such as
site-directed mutagenesis or standard restriction
digest/ligation cloning techniques. Alternatively, the
DNA encoding all or part of the artificial substrate may
be produced synthetically using a commercially available
automated oligonucleotide synthesizer. Regardless of the
techniques used to insert the protease cleavage site into
the substrate polypeptide or alter its native cleavage
site, it is crucial that the reading frame of the
substrate polypeptide remain intact, without the
insertion of stop codons.
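The reading-frame requirement reduces to two checks on the inserted DNA: its length must be a multiple of three, and none of its codons may be a stop codon. The Python sketch below is a minimal illustration that assumes the insert is placed exactly at a codon boundary of a DNA coding sequence (an insert that splits a codon would also need the junction codons checked); the helper names and the short sequences are hypothetical toy examples.

```python
# Minimal sketch; assumes insertion at a codon boundary. Names are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def keeps_frame(insert: str) -> bool:
    """True if an insert placed at a codon boundary preserves the reading frame."""
    insert = insert.upper()
    if len(insert) % 3 != 0:
        return False  # a length that is not a multiple of 3 shifts every downstream codon
    codons = (insert[i:i + 3] for i in range(0, len(insert), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def insert_at_codon(cds: str, codon_index: int, insert: str) -> str:
    """Insert `insert` before codon number `codon_index` (0-based) of `cds`."""
    if not keeps_frame(insert):
        raise ValueError("insert would break the reading frame or add a stop codon")
    pos = 3 * codon_index
    return cds[:pos] + insert + cds[pos:]


print(keeps_frame("GTGCAC"))  # ApaLI site, read in frame as Val-His codons
print(keeps_frame("GTGCA"))   # 5 nt: frameshift
print(keeps_frame("GTGTAA"))  # introduces an in-frame TAA stop
print(insert_at_codon("ATGAAATAG", 1, "GTGCAC"))
```

Because the insert lands between codons, the codons on either side of it are unchanged, so checking the insert alone is sufficient under this assumption.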

The secretable polypeptide from which the
artificial substrate is derived may be selected from
any pre-polypeptide that can be cleaved, with the
resulting mature polypeptide secreted out of the host
cell used for the assay, but that is not normally present
in that cell. For use in eukaryotic cells, there are two
main categories of pre-polypeptide from which the choice
can be made.

The first and preferred category comprises
pre-polypeptides that are expressed and cleaved in the
cytoplasmic compartment. Among these proteins are
interleukin-1β (IL-1β), interleukin-1α (IL-1α), basic
fibroblast growth factor (bFGF) and endothelial-monocyte
activating polypeptide II (EMAP-II). The advantage of
using cytoplasmic pre-polypeptides is that there is a
much greater likelihood that the protease and the
artificial substrate will share the same subcellular
compartment. This is because most proteases of interest
are also cytoplasmic proteins and thus will have access
to the artificial substrate.

The second category of pre-polypeptides that
may be used to create the artificial substrates used in
the methods of this invention comprises those that are
expressed on the cell surface through the organellar
secretory pathway and are retained on the cell surface.
Such substrates are useful to assay endogenous and
exogenous cell membrane proteases, as well as exogenous
proteases that are similarly engineered to be cell
membrane proteins. The technique of creating a cell
membrane protease or substrate involves cloning a leader
peptide (i.e., signal sequence) onto the N-terminus of
the substrate or protease and a hydrophobic membrane
anchor sequence (either a transmembrane domain or a
glycosylphosphatidylinositol anchor sequence) onto the
C-terminus. The resulting substrate is a cell membrane
protein with an extracellularly located cleavage site.
When cleaved by a cell membrane protease on the same or
a neighboring cell, the secreted polypeptide portion of
the substrate is released into the medium.

Examples of sequences that may be used for
anchoring these proteins in the membrane are the
transmembrane domains of the TNFα precursor [Nedospasov
et al., Cold Spring Harb. Symp. Quant. Biol., 51, pp.
611-24 (1986)], the SP-C precursor [Keller et al.,
Biochem. J., 277, pp. 493-99 (1991)], or alkaline
phosphatase [Berger et al., Proc. Natl. Acad. Sci. USA,
86, pp. 1457-60 (1989)].

Techniques for cloning a signal sequence onto a
cytoplasmic protein have been well documented [see, for
example, Kizer and Tropsha, BBRC, 174, pp. 586-92 (1991);
Jost et al., J. Biol. Chem., 269, pp. 26267-72 (1994)
(expression and secretion of functional single-chain Fv
molecules using an immunoglobulin light chain leader
sequence); and Sasada et al., Cell Structure Function,
13, pp. 129-41 (1988) (secretion of human EGF and IgE in
mammalian cells using an IL-2 leader sequence)], as have
techniques for cloning transmembrane anchor sequences
onto cytoplasmic proteins [Berger et al., supra; Oda et
al., Biochem. J., 301, pp. 577-83 (1984)]. By combining
these two techniques, the protease or substrate of
interest can be converted from a cytoplasmic protein into
a cell surface membrane protein.

To ensure that the substrate and protease will
have access to one another, and according to an
alternate embodiment of the invention, the artificial
substrate and an exogenous protease to be assayed may be
encoded as part of a single polyprotein. That
polyprotein may be a cytoplasmic or a membrane protein,
as long as the substrate and protease domains reside in
the same cellular compartment.

The choice of host cell to use in this method
is virtually unlimited. Any cell that can grow in
culture, can be transformed or transfected with
heterologous nucleotide sequences and can express those
sequences may be employed in this method. These include
bacteria, such as E. coli and Bacillus; yeast and other
fungi; plant cells; insect cells; and mammalian cells.
In addition, expression of either of those sequences in
higher eukaryotic host cells may be transient or stable.
Preferably, the host cell is a higher eukaryotic cell
that is incapable of cleaving the substrate at its
native cleavage site. More preferably, the host cell is
a mammalian cell. Most preferably, the host cell is a
COS cell.

It will be apparent that the specific choice of
cell is governed by the particular protease to be assayed
and by the particular artificial substrate used. In
embodiments that assay an exogenous protease, one obvious
limitation is that the endogenous cellular enzymes of the
chosen host must be unable to cleave the artificial
substrate to any significant extent. The endogenous rate
of artificial substrate cleavage may be determined by
transforming the selected host cell with only the
nucleotide sequence coding for the artificial substrate
and then growing that host under conditions which cause
expression of that nucleotide sequence and which would
cause expression of the exogenous protease-encoding
nucleotide sequence if that sequence were present. The
growth medium of the cell is then assayed for the
presence of the secreted polypeptide portion of the
substrate. In assays that measure exogenous protease
activity, control cells (no exogenous protease expressed)
should secrete less than 10% of the total amount of
expressed substrate (due to endogenous cleavage and, in
assays that do not distinguish between cleaved and
uncleaved substrates, leaching of uncleaved substrate
out of the cell) in order to be useful in the methods of
this invention. When an endogenous protease is assayed,
the control for non-specific substrate cleavage is a
cell transformed with a substrate that contains a
mutation at the cleavage site. This mutation renders the
substrate uncleavable by the specific endogenous protease
being assayed, but still susceptible to non-specific
cleavage. As with assays for exogenous proteases,
control cells should secrete less than 10% of the total
amount of expressed substrate.
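The 10% acceptance criterion for control cells is a simple ratio test. In this Python sketch the helper names are hypothetical, and it is assumed that the caller has already measured the total amount of expressed substrate (for example, substrate recovered from cells plus medium) in the same units as the secreted amount; how that total is obtained is not specified above. The readings are invented, in arbitrary units.

```python
# Sketch of the control-cell criterion; helper names and readings are hypothetical.
MAX_BACKGROUND_FRACTION = 0.10  # control cells must secrete < 10% of expressed substrate


def background_fraction(secreted: float, total_expressed: float) -> float:
    """Fraction of expressed substrate found in the medium of control cells."""
    if total_expressed <= 0:
        raise ValueError("total_expressed must be positive")
    return secreted / total_expressed


def control_is_acceptable(secreted: float, total_expressed: float) -> bool:
    """Apply the < 10% background criterion to a control culture."""
    return background_fraction(secreted, total_expressed) < MAX_BACKGROUND_FRACTION


# Hypothetical control readings in arbitrary, consistent units.
print(control_is_acceptable(0.3, 5.0))  # 6% background
print(control_is_acceptable(0.6, 5.0))  # 12% background
```

The same test applies to both control types described above: cells lacking the exogenous protease, and cells expressing a cleavage-site mutant substrate when an endogenous protease is assayed.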

In order to quantitate the protease activity,
the amount of secreted substrate polypeptide is measured.
Quantitation may be achieved by subjecting the growth
medium to any of the various standard assay procedures
that are well known in the art. These include, but are
not limited to, immunoblotting, ELISA,
immunoprecipitation, RIA, other colorimetric assays,
enzymatic assays or bioassays. Quantitation techniques
that employ antibodies preferably utilize antibodies
that have low cross-reactivity with the uncleaved
substrate. Preferably, the cross-reactivity is less than
20%, and more preferably less than 5%.
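Low cross-reactivity matters because any uncleaved substrate that leaches into the medium adds to the apparent signal. The sketch below assumes a simple additive model, not stated in the text, in which the antibody detects the mature polypeptide fully and the uncleaved substrate at a fraction equal to its cross-reactivity. The function names and readings are hypothetical; the 3% figure matches the cross-reactivity reported for the mature IL-1β antibody in Example 2.

```python
# Assumed additive model of antibody signal; names and readings are hypothetical.
def apparent_signal(mature: float, uncleaved: float, cross_reactivity: float) -> float:
    """Signal in mature-polypeptide equivalents under the additive model."""
    return mature + cross_reactivity * uncleaved


def cross_reactivity_grade(cross_reactivity: float) -> str:
    """Grade an antibody against the preferred thresholds (< 20%, better < 5%)."""
    if cross_reactivity < 0.05:
        return "more preferred (<5%)"
    if cross_reactivity < 0.20:
        return "preferred (<20%)"
    return "too cross-reactive"


# Hypothetical medium with no mature polypeptide but 10 units of leached,
# uncleaved substrate: the false signal scales with cross-reactivity.
for cr in (0.03, 0.15, 0.30):
    print(cr, apparent_signal(0.0, 10.0, cr), cross_reactivity_grade(cr))
```

Under this model an antibody with 3% cross-reactivity reports leached substrate at only 3% of its true amount, which keeps that source of background well below the 10% control limit.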

According to another embodiment, the present
invention provides a method of screening for protease
inhibitors. In this method, the above-described assay is
carried out in the presence and absence of potential
inhibitors of the protease. When the assays of this
invention are performed using cells which transiently
express the substrate and protease, the inhibitor is
preferably added immediately after transfection with the
protease- and substrate-encoding DNA sequences. When
stable transformants are used, the potential inhibitor is
added at the beginning of the assay. The efficacy of the
potential inhibitor (and its ability to cross the cell
membrane) is determined by comparing the amount of
secreted substrate polypeptide present in the media of
cells assayed in its presence versus its absence.
Compounds which cause at least a 90% reduction in the
amount of secreted substrate polypeptide are potentially
useful protease inhibitors.
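The screening criterion compares secreted polypeptide with and without a test compound and flags compounds that cause at least a 90% reduction. The Python sketch below is a minimal illustration; the helper names, compound labels and readings (in arbitrary units) are hypothetical, and no background subtraction is applied because none is described in the text.

```python
# Sketch of the >= 90% reduction criterion; names and readings are hypothetical.
REQUIRED_REDUCTION = 90.0  # percent


def percent_reduction(without_compound: float, with_compound: float) -> float:
    """Reduction in secreted substrate polypeptide caused by a test compound."""
    if without_compound <= 0:
        raise ValueError("untreated signal must be positive")
    return 100.0 * (1.0 - with_compound / without_compound)


def is_candidate_inhibitor(without_compound: float, with_compound: float) -> bool:
    """True if the compound meets the 90% reduction threshold."""
    return percent_reduction(without_compound, with_compound) >= REQUIRED_REDUCTION


# Hypothetical readings of secreted polypeptide, arbitrary units.
untreated = 100.0
readings = {"compound A": 8.0, "compound B": 44.0, "compound C": 2.0}
for name, treated in readings.items():
    print(name, round(percent_reduction(untreated, treated), 1),
          is_candidate_inhibitor(untreated, treated))
```

Because the readout is taken from intact cells, a compound that fails to cross the cell membrane shows little reduction here even if it inhibits the purified enzyme; that is the sense in which the assay also reports membrane permeability.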

In order that the invention described herein 
may be more fully understood, the following examples are 
set forth. It should be understood that these examples 
are for illustrative purposes only and are not to be 
construed as limiting this invention in any manner. 

EXAMPLE 1 
Construction Of Expression Plasmids 
A. HCV NS3 Protease 

We cloned the nucleotide sequence coding for
the entire, intact HCV NS3 protease, an NS3-4A
polyprotein or a truncated NS3 consisting of amino acids
1 to 180 into the mammalian expression plasmid pcDL-SRα
[Y. Takebe et al., Mol. Cell. Biol., 8, pp. 466-72
(1988)]. That plasmid contains an SV40 origin of
replication and an HTLV LTR enhancer/promoter sequence
which drives high-level expression of the
NS3 coding sequences (Figure 1).

The respective NS3 coding fragments (full-length
NS3, the NS3-4A polyprotein or truncated NS3 (amino
acids 1-181)) were obtained by PCR of the corresponding
portions of a full-length HCV H strain cDNA (SEQ ID
NO:1). For each of the three coding fragments the
following 5' primer was used (SEQ ID NO:2):

5'-GGACTAGTCTGCAGTCTAGAGCTCCATGGCGCCCATCACGGCGTACG-3'.

The fragment-specific 3' primers used were:
NS3 (SEQ ID NO:3):
3'-GAAGATCTGAATTCTAGATTTTACGTGACGACCTCCACGTCGGC-5';
NS3-4A (SEQ ID NO:4):
3'-GAAGATCTGAATTCTAGATTTTAGCACTCTTCCATCTCATCGAA-5'; and
NS3 (1-181) (SEQ ID NO:5):
3'-GAAGATCTGAATTCTAGATTTTAGGATCTCATGGTTGTCTCTAGG-5'.

These primers produced PCR-amplified fragments containing
multiple restriction sites at either end for ease of
cloning.
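One way to sanity-check the fragment-specific 3' primers is to reverse-complement each one and translate the gene-specific part in the frame that ends at the stop codon supplied by the primer tail. The Python sketch below is illustrative and rests on stated assumptions: that each printed primer string is used as an ordinary primer read 5' to 3' from left to right, and that the shared motif TCTAGATTTTA (an XbaI site followed by TTTTA, the complement of a TAA stop plus two bases) marks the end of the cloning tail. The helper names are hypothetical; the codon table is the standard genetic code.

```python
# Illustrative primer check under stated assumptions; helper names are hypothetical.
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
TAIL_MOTIF = "TCTAGATTTTA"  # assumed end of the cloning tail (XbaI site + TTTTA)


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def encoded_c_terminus(primer: str) -> str:
    """Residues encoded immediately upstream of the stop codon carried by a 3' primer."""
    cut = primer.index(TAIL_MOTIF) + len(TAIL_MOTIF)
    tail, gene_part = primer[:cut], primer[cut:]
    if not reverse_complement(tail).startswith("TAA"):
        raise ValueError("tail does not supply a TAA stop codon")
    coding = reverse_complement(gene_part)
    coding = coding[len(coding) % 3:]  # keep the frame that ends at the stop codon
    return translate(coding)


PRIMERS = {
    "NS3 (SEQ ID NO:3)": "GAAGATCTGAATTCTAGATTTTACGTGACGACCTCCACGTCGGC",
    "NS3-4A (SEQ ID NO:4)": "GAAGATCTGAATTCTAGATTTTAGCACTCTTCCATCTCATCGAA",
    "NS3 (1-181) (SEQ ID NO:5)": "GAAGATCTGAATTCTAGATTTTAGGATCTCATGGTTGTCTCTAGG",
}

for name, primer in PRIMERS.items():
    print(f"{name}: ...{encoded_c_terminus(primer)}*")
```

Under these assumptions the three primers place a stop codon after ...ADVEVVT, ...FDEMEEC and ...LETTMRS respectively, which is consistent with their intended roles of ending the NS3, NS3-4A and truncated NS3 coding fragments.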

In order to ligate the fragments to the vector,
the vector was first cleaved with PstI and EcoRI to
remove a small fragment. The cut vector was then
purified and ligated to the respective PstI/EcoRI-cut NS3
protease-encoding fragment.
B. IL-1β/NS3 Substrate

A derivative of plasmid pKV containing the
pre-IL-1β coding sequence has been described by P. K.
Wilson et al., Nature, 370, pp. 253-70 (1994). That
plasmid contains the SV40 origin of replication and the
early promoter. The pre-IL-1β sequence was cloned between
the SpeI and BglII sites shown in Figure 2.

We inserted a double-stranded synthetic DNA
fragment (SEQ ID NO:6) which encoded 20 amino acids (SEQ
ID NO:7: GADTEDVVCCSMSYTWTGVH) and contained linkers at
both ends that included an ApaLI restriction site. The
DNA was cloned into the ApaLI site in pre-IL-1β (between
the codons for amino acids His115 and Asp116),
immediately upstream of the native cleavage site (located
between Asp116 and Ala117). The first 18 amino acids of
the insert correspond to the HCV NS5A/5B peptide cleavage
site. The last two amino acids are encoded by the linker.
The inserted DNA maintained the reading frame of the
native pre-IL-1β protein. The resulting substrate is
referred to throughout the application as "pre-IL-1β*".
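The composition of the 20-residue insert can be checked directly. This sketch assumes that the insert reads GADTEDVVCCSMSYTWTGVH, and that the ApaLI recognition sequence GTGCAC is read in frame as the two linker codons. It also locates NS3 cleavage simply as the single Cys-Ser bond in the insert, which is a simplification: substrate recognition involves more than these two residues. Names are illustrative.

```python
# Illustrative check of the insert; the sequence reading and names are assumptions.
INSERT = "GADTEDVVCCSMSYTWTGVH"           # assumed reading of SEQ ID NO:7
SITE_LENGTH = 18                          # residues from the HCV NS5A/5B junction
APALI_SITE = "GTGCAC"                     # ApaLI recognition sequence
LINKER_CODONS = {"GTG": "V", "CAC": "H"}  # only the two codons needed here


def split_insert(insert: str) -> tuple[str, str]:
    """Separate the NS5A/5B-derived residues from the linker-encoded residues."""
    return insert[:SITE_LENGTH], insert[SITE_LENGTH:]


def ns3_products(peptide: str) -> tuple[str, str]:
    """Cut once between Cys and Ser (simplified NS3 P1-P1' rule)."""
    if peptide.count("CS") != 1:
        raise ValueError("expected exactly one Cys-Ser bond")
    cut = peptide.index("CS") + 1
    return peptide[:cut], peptide[cut:]


site, linker = split_insert(INSERT)
linker_from_dna = "".join(LINKER_CODONS[APALI_SITE[i:i + 3]] for i in (0, 3))
print(len(INSERT), site, linker, linker_from_dna == linker)
print(ns3_products(INSERT))
```

The check shows why the insert has 20 residues: 18 come from the viral junction, and the remaining Val-His pair is what the ApaLI site itself encodes when read in frame.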

NS3 cleaves the inserted peptide between the
cysteine and serine residues. Because the COS cells we
utilized in this assay were incapable of cleaving
pre-IL-1β (data not shown), we did not have to knock out
the native pre-IL-1β cleavage site.

In another construct, we performed site-directed
mutagenesis to alter the native pre-IL-1β cleavage site
of Asp116-Ala117-Pro118 to Cys-Ser-Met, a conserved
recognition sequence for NS3. This construct is referred
to throughout the application as "pre-IL-1β(CSM)".
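At the protein level, the Asp116-Ala117-Pro118 to Cys-Ser-Met change is an equal-length substitution, so the reading frame is preserved. The fragment in this sketch is a hypothetical excerpt that is assumed to cover residues 113-122 of pre-IL-1β. It should be checked against SEQ ID NO:11 before any real use. The helper name is illustrative.

```python
# Illustrative substitution helper; the fragment and its numbering are assumptions.
def substitute(seq: str, seq_start: int, first_residue: int,
               expected: str, replacement: str) -> str:
    """Replace residues beginning at `first_residue` (numbered from `seq_start`)."""
    i = first_residue - seq_start
    if i < 0 or seq[i:i + len(expected)] != expected:
        raise ValueError(f"expected {expected!r} at residue {first_residue}")
    if len(replacement) != len(expected):
        raise ValueError("substitution must not change the length (reading frame)")
    return seq[:i] + replacement + seq[i + len(expected):]


# Hypothetical excerpt assumed to span pre-IL-1beta residues 113-122.
FRAGMENT, FRAGMENT_START = "YVHDAPVRSL", 113
mutant = substitute(FRAGMENT, FRAGMENT_START, 116, "DAP", "CSM")
print(mutant)
```

Verifying the expected wild-type residues before substituting guards against an off-by-one error in the residue numbering, the most common mistake when a mutagenesis design is transcribed from a numbered sequence.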
C. NS3-4A-Δ4B-IL-1β

In order to create a single fusion polypeptide
that encoded both the exogenous protease and the
polypeptide substrate, we utilized the fact that NS3 can
autoprocess (cleave) an NS3-4A-4B polyprotein at both the
NS3-4A and 4A-4B junctions.

We isolated a DNA fragment that encoded NS3-4A
and the first 60 amino acids of 4B by PCR using the
HCV strain H cDNA referred to above (SEQ ID NO:1) and the
following primers: SEQ ID NO:8:
5'-GGACTAGTCTGCAGTCTAGAGCTCCATGGCGCCCATCACGGCGTACG-3' and
SEQ ID NO:9: 3'-GGACGCGGTCTGCAGGAGGCCGAGGGC-5'. The PCR
products were digested with PstI and XbaI prior to
cloning.

The mature IL-1β portion of the construct
(amino acids 117-269 of SEQ ID NO:11) was created by PCR
cloning of full-length pre-IL-1β cDNA (SEQ ID NO:10)
using the following primers:

SEQ ID NO:12: 5'-CTGGGGGTGCTGCAGGCACGTGTACGATCACTGAAC-3';
and SEQ ID NO:13: 3'-GGGAATTCTAGATTTTAGGAAGACACAAATTG-5'.
These PCR products were digested with PstI and EcoRI
prior to cloning.

The NS3-4A-Δ4B and IL-1β fragments were then
ligated together with XbaI/EcoRI-digested pcDL-SRα to
obtain the desired construct.
As a control, we created a mutant NS3 protease
fusion protein construct. This construct was identical
to the one described above, except that the NS3 portion
was created by PCR using the same primers and the cDNA of
the NS3 active site mutant S1165A [A. Grakoui et al., J.
Virol., 67, pp. 2832-43 (1993)]. The NS3 active site
mutant contains a serine-to-alanine mutation in its
active site, rendering the enzyme inactive.

EXAMPLE 2 

Transfection Of COS Cells And Assay Of Secreted IL-1β
The expression plasmid constructs described in
Example 1 were transfected into COS-7 cells using the
DEAE-Dextran transfection protocol [Gu et al., Neuron, 5,
pp. 147-57 (1990)]. COS cells in 6-well clusters or 100
mm dishes at 50% confluency were transfected with 4-10 µg
of the desired plasmid in a DEAE-Dextran solution.
Following transfection, the cells were incubated an
additional 48 hours before assaying.

The processing of pre-IL-1β or the NS3-4A-Δ4B-IL-1β
fusion protein and the subsequent secretion of mature
IL-1β into the media were measured by ELISA of IL-1β using
an antibody that was specific for mature IL-1β (approx. 3%
cross-reactivity with pre-IL-1β). We analyzed expression
by harvesting the COS cells in ice-cold phosphate-buffered
saline, lysing the cells in a 0.1% Triton X-100
buffer and centrifuging the lysate to remove cell debris.
The lysates were then analyzed by SDS-PAGE and
immunoblotting using an IL-1β antibody (Genzyme) and an
NS3 antibody. Alternatively, expression, processing and
secretion were analyzed by labelling the cells for 24
hours in the presence of [35S]-methionine, incubating the
cells for an additional 24 hours after the label was
removed, and then utilizing immunoprecipitation and
SDS-PAGE to analyze the polypeptides.

EXAMPLE 3 

NS3-Specific Processing Of An NS3-4A-Δ4B-IL-1β Fusion
Protein And Secretion Of Δ4B-IL-1β Into The Media

Transfectants expressing the NS3-4A-Δ4B-IL-1β
fusion protein autoprocessed that protein at both the
NS3-4A and 4A-4B junctions. The cell lysates of these
transfectants were subjected to Western blotting
utilizing an anti-NS3 antibody. Figure 3, panel A, Wt-1
and Wt-2 lanes, shows that this experiment produced a
doublet band in the 70 kDa area, present only as a single
band in the untransformed control cells (panel A, No DNA
lane). The second band of the doublet in the Wt-1 and
Wt-2 lanes corresponds to the size of mature NS3. A
transfectant that expressed an inactive mutant
NS3-containing NS3-4A-Δ4B-IL-1β fusion protein showed
no 70 kDa doublet and therefore was not autoprocessed
(NS3 mutant lane). A transfectant that coexpressed the
same mutant fusion protein together with a truncated but
active NS3, NS3(1-180), was also analyzed.
Surprisingly, the mutant fusion protein did not appear to
be cleaved by NS3(1-180), as indicated by the lack of a
doublet in the 70 kDa region (NS3 mutant + NS3(1-180)
lane). However, a 20 kDa band representing the truncated
NS3 was detected in that lysate, as indicated by the
NS3(1-180) arrow.

A similar experiment performed on cell lysates
with a mature IL-1β-specific antibody demonstrated the
presence of a band corresponding in size to the Δ4B-IL-1β
portion of the fusion protein in both NS3-4A-Δ4B-IL-1β
transfectants (Figure 3, panel B, Wt-1 and Wt-2 lanes)
and, to a lesser degree, in the NS3 mutant fusion
protein/NS3(1-180) cotransfectant. Virtually no IL-1β
was detected in the transfectant expressing only the NS3
mutant fusion protein (IL-1β arrow). These experiments
confirm that the cleavage observed in the wild-type
NS3-4A-Δ4B-IL-1β transfectants was dependent upon NS3
protease activity. Thus, cleavage of this fusion protein
was essentially NS3-dependent and not caused by an
endogenous protease.

Secretion of the cleaved substrate was
determined by assaying culture media with a commercially
available mature IL-1β-specific ELISA (R&D Systems,
Minneapolis, MN). For the wild-type NS3-containing
construct we detected a concentration of 2.5 µg/ml of
IL-1β in the medium. We detected less than 0.25 pg/ml of
IL-1β in the media of cells transfected with the mutant
NS3-containing construct. An immunoprecipitation
experiment with the same anti-IL-1β antibody confirmed
these results, demonstrating the presence of Δ4B-IL-1β
in the media of cells containing the wild-type
NS3-containing construct but none in the media of cells
containing the mutant NS3-containing construct
(Figure 4).
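ELISA readings are usually converted to concentrations by interpolating a standard curve run alongside the samples. The Python sketch below shows one simple way to do this. It assumes a piecewise-linear standard curve of optical density against IL-1β concentration. The standard values and the names `STANDARDS`, `od_to_conc` and `media_conc` are hypothetical, and many kit protocols recommend a four-parameter logistic fit rather than linear interpolation.

```python
from bisect import bisect_left

# Hypothetical standard curve: (IL-1beta in pg/ml, mean OD450), OD rising
# with concentration. Illustrative values only, not data from this document.
STANDARDS = [(0.0, 0.05), (3.9, 0.09), (15.6, 0.20), (62.5, 0.62), (250.0, 2.10)]


def od_to_conc(od, standards=STANDARDS):
    """Interpolate a concentration (pg/ml) from an OD on the standard curve.

    Returns None if the OD lies outside the range of the standards, so the
    sample can be re-assayed at another dilution instead of extrapolated.
    """
    ods = [o for _, o in standards]
    if od < ods[0] or od > ods[-1]:
        return None
    i = bisect_left(ods, od)  # first standard with OD >= od
    if ods[i] == od:  # exact hit on a standard
        return standards[i][0]
    (c0, o0), (c1, o1) = standards[i - 1], standards[i]
    return c0 + (od - o0) * (c1 - c0) / (o1 - o0)


def media_conc(od, dilution):
    """Concentration in the undiluted medium (pg/ml), or None if out of range."""
    conc = od_to_conc(od)
    return None if conc is None else conc * dilution


# Hypothetical reading: medium diluted 100-fold gives an OD of 0.41.
print(media_conc(0.41, 100))
```

Returning `None` instead of extrapolating is a deliberate choice. A reading outside the standard range is then flagged for re-assay rather than reported as a spurious concentration.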

EXAMPLE 4

NS3-Specific Processing Of Mutated Pre-IL-1β
Containing An Artificial Cleavage Site And
Secretion Of IL-1β Into The Media

We confirmed that NS3 protease can cleave
artificial substrates other than an HCV polypeptide by
cotransfecting COS cells with the NS3-4A construct and
either of the pre-IL-1β-containing artificial substrate
expression constructs described in Example 1C.

Co-expression of the NS3-4A and pre-IL-1β*
substrate sequences resulted in rapid cleavage of the
substrate and concomitant secretion of a 19 kDa IL-1β into
the media. Secretion was quantitated using an ELISA
specific for the processed form of IL-1β. An immunoblot
of cell lysates from these transformants demonstrated the
presence of both cleaved and uncleaved substrate (Figure
5, NS3-4A + IL-1β* lane). The same experiment was
performed using cells that were metabolically labelled
with [35S]-methionine, followed by immunoprecipitation of
the media with the processed IL-1β-specific antibody;
the results of the immunoprecipitation experiment are
shown in Figure 6, NS3-4A + pre-IL-1β* lanes.

When we coexpressed NS3-4A and the
pre-IL-1β(CSM) sequences, we also observed cleavage of the
substrate at the predicted Cys-Ser site. Both cleaved
and uncleaved forms were observed in cell lysates by
immunoblotting with an IL-1β-specific antibody (Figure 5,
NS3-4A + IL-1β(CSM) lane). Immunoprecipitation of the
media from [35S]-methionine-labelled cells also
demonstrated the presence of an IL-1β-containing cleavage
product, but in a smaller amount than that observed for
the 5A-5B-containing pre-IL-1β substrate (Figure 6,
NS3-4A + pre-IL-1β(CSM) lane).

EXAMPLE 5 
Assay of NS3 Inhibitors 

We tested the potential of compounds VH-15924
and VH-16075 as HCV NS3 protease inhibitors in our
assays.

Transfectants expressing the NS3-4A-Δ4B-IL-1β
fusion protein were grown in the presence of varying
amounts of VH-15924. Even at concentrations as high as
100 µM, we detected the cleavage product, Δ4B-IL-1β, in
the media. This indicated that VH-15924 was not an
effective inhibitor of NS3 protease.

We also assayed the inhibition of cleavage and
secretion of the pre-IL-1β* substrate by both VH-15924 and
VH-16075. VH-16075 inhibited cleavage and secretion with
an IC50 of 4 pM. As in the previous experiment, VH-15924
did not completely inhibit cleavage/secretion even at
concentrations of 100 µM (Figure 7).
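The example does not state how the IC50 was derived. The Python sketch below shows one simple approach. It assumes that secreted IL-1β has been measured (for example by ELISA) at several compound concentrations and in an untreated control, which mirrors the with/without-compound comparison of this assay. Percent inhibition is computed relative to the control, and the 50% point is located by linear interpolation on a log10 concentration axis. The names `percent_inhibition` and `ic50_interpolated` and the example numbers are hypothetical. A four-parameter logistic fit is a common, more rigorous alternative.

```python
import math


def percent_inhibition(signal, control, background=0.0):
    """Percent inhibition of secretion relative to an untreated control."""
    window = control - background
    if window <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - (signal - background) / window)


def ic50_interpolated(doses, signals, control, background=0.0):
    """Estimate IC50 by interpolating % inhibition against log10(dose).

    doses must be positive and ascending; the result is in the same units as
    doses. Returns None if 50% inhibition is not crossed between two tested
    doses (for example, if it is never reached in the tested range).
    """
    points = [(math.log10(d), percent_inhibition(s, control, background))
              for d, s in zip(doses, signals)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 50.0 <= y1:
            x = x0 + (50.0 - y0) * (x1 - x0) / (y1 - y0)
            return 10 ** x
    return None


# Hypothetical dose series (uM) and secreted IL-1beta signals (pg/ml).
doses = [0.1, 1.0, 10.0, 100.0]
signals = [900.0, 600.0, 200.0, 50.0]
print(ic50_interpolated(doses, signals, control=1000.0))  # about 1.78 (uM)
```

Interpolating on a log scale reflects the roughly sigmoidal shape that dose-response data usually show against log concentration. With only a few widely spaced doses, the estimate is coarse.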

While I have hereinbefore presented a number of
embodiments of this invention, it is apparent that my
basic construction can be altered to provide other
embodiments which utilize the methods of this invention.
Therefore, it will be appreciated that the scope of this
invention is to be defined by the claims appended hereto
rather than the specific embodiments which have been
presented hereinbefore by way of example.
SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: Su, Michael

(ii) TITLE OF INVENTION: METHODS AND HOST CELLS FOR ASSAYING
EXOGENOUS AND ENDOGENOUS PROTEASE ACTIVITY

(iii) NUMBER OF SEQUENCES: 13

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Fish & Neave

(B) STREET: 1251 Avenue of the Americas

(C) CITY: New York

(D) STATE: New York

(E) COUNTRY: United States of America

(F) ZIP: 10020

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Haley Jr, James F

(B) REGISTRATION NUMBER: 27,794

(C) REFERENCE/DOCKET NUMBER: VPI/95-01

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 212-596-9000 

(B) TELEFAX: 212-596-9090 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9401 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 3420..5312

(D) OTHER INFORMATION: /product= "NS3 protease"

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 5313..5474

(D) OTHER INFORMATION: /product= "NS4A"

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 5475..5552

(D) OTHER INFORMATION: /product= "truncated NS4B"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



[illegible] [illegible] GACACTCCAC CATAGATCAC TCCCCTGTGA GGAACTACTG 60

[illegible] GAAAGCGTCT AGCCATGGCG TTAGTATGAG TGTCGTGCAG CCTCCAGGAC 120

[illegible] GGGAGAGCCA TAGTGGTCTG CGGAACCGGT GAGTACACCG GAATTGCCAG 180

GACGACCGGG TCCTTTCTTG GATAAACCCG CTCAATGCCT GGAGATTTGG GCGTGCCCCC 240

GCAAGACTGC TAGCCGAGTA GTGTTGGGTC GCGAAAGGCC TTGTGGTACT GCCTGATAGG 300

GTGCTTGCGA GTGCCCCGGG AGGTCTCGTA GACCGTGCAC CATGAGCACG AATCCTAAAC 360

CTCAAAGAAA AACCAAACGT AACACCAACC GTCGCCCACA GGACGTCGAG TTCCCGGGTG 420

GCGGTCAGAT CGTTGGTGGA GTTTACTTGT TGCCGCGCAG GGGCCCTAGA TTGGGTGTGC 480

GCGCGACGAG GAAGACTTCC GAGCGGTCGC AACCTCGTGG TAGACGTCAG CCTATCCCCA 540

[illegible] GCCCGAGGGC AGGACCTGGG CTCAGCCCGG GTACCCTTGG CCCCTCTATG 600

[illegible] [illegible] GCGGGATGGC TCCTGTCTCC CCGTGGCTCT CGGCCTAGCT 660

[illegible] [illegible] [illegible] [illegible] TAAGGTCATC GATACCCTTA 720

[illegible] [illegible] [illegible] [illegible] CGGCGCCCCT CTTGGAGGCG 780

[illegible] [illegible] [illegible] [illegible] CGGCGTGAAC [illegible] 840

GGAACCTTCC [illegible] [illegible] [illegible] [illegible] TGCCTGACTG 900

TGCCCGCTTC [illegible] [illegible] [illegible] [illegible] [illegible] 960

GCCCTAATTC GAGTATTGTG TACGAGGCGG [illegible] [illegible] [illegible] 1020

TCCCTTGCGT TCGCGAGGGT AACGCCTCGA GGTGTTGGGT GGCGGTGACC CCCACGGTGG 1080

CCACCAGGGA CGGCAAACTC CCCACAACGC AGCTTCGACG TCATATCGAT CTGCTTGTCG 1140

GGAGCGCCAC CCTCTGCTCA GCCCTCTACG TGGGGGACCT GTGCGGGTCT GTTTTTCTTG 1200

TTGGTCAACT GTTTACCTTC TCTCCCAGGC GCCACTGGAC GACGCAAAGC TGCAATTGTT 1260

CTATCTATCC CGGCCATATA ACGGGTCATC GCATGGCATG GGATATGATG ATGAACTGGT 1320

CCCCTACGGC AGCGTTGGTG GTAGCTCAGC TGCTCCGGAT CCCACAAGCC ATCATGGACA 1380

TGATCGCTGG TGCTCACTGG GGAGTCCTGG CGGGCATAGC GTATTTCTCC ATGGTGGGGA 1440

ACTGGGCGAA GGTCCTGGTA GTGCTGCTGC TATTTGCCGG CGTCGACGCG GAAACCCACG 1500

TCACCGGGGG AAGTGCCGGC CACACCACGG CTGGGCTTGT TGGTCTCCTT ACACCAGGCG 1560

CCAAGCAGAA CATCCAACTG ATCAACACCA ACGGCAGTTG [illegible] AGCACGGCCT 1620

TGAACTGCAA CGATAGCCTT ACCACCGGCT GGTTAGCAGG GCTCTTCTAT CGCCACAAAT 1680

TCAACTCTTC AGGCTGTCCT GAGAGGTTGG CCAGCTGCCG ACGCCTTACC GATTTTGCCC 1740

AGGGCTGGGG TCCCATCAGT TATGCCAACG GAAGCGGCCT TGACGAACGC CCCTACTGTT 1800

GGCACTACCC TCCAAGACCT TGTGGCATTG TGCCCGCAAA GAGCGTGTGT GGCCCGGTAT 1860

ATTGCTTCAC TCCCAGCCCC GTGGTGGTGG GAACGACCGA CAGGTCGGGC GCGCCTACCT 1920

ACAGCTGGGG TGCAAATGAT ACGGATGTCT TCGTCCTTAA CAACACCAGG CCACCGCTGG 1980

GCAATTGGTT CGGTTGTACC TGGATGAACT CAACTGGATT CACCAAAGTG TGCGGAGCGC 2040

CCCCTTGTGT CATCGGAGGG GTGGGCAACA ACACCTTGCT CTGCCCCACT GATTGCTTCC 2100

GCAAACATCC GGAAGCCACA TACTCTCGGT GCGGCTCCGG TCCCTGGATT ACACCCAGGT 2160

GCATGGTCGA CTACCCGTAT AGGCTTTGGC ACTATCCTTG TACTATCAAT TACACCATAT 2220

TCAAAGTCAG GATGTACGTG GGAGGGGTCG AGCACAGGCT GGAAGCGGCC TGCAACTGGA 2280

CGCGGGGCGA ACGCTGTGAT CTGGAAGACA GGGACAGGTC CGAGCTCAGC CCATTGCTGC 2340

TGTCCACCAC ACAGTGGCAG GTCCTTCCGT GTTCTTTCAC GACCCTGCCA GCCTTGTCCA 2400

CCGGCCTCAT CCACCTCCAC CAGAACATTG TGGACGTGCA GTACTTGTAC GGGGTGGGGT 2460

CAAGCATCGC GTCCTGGGCC ATTAAGTGGG AGTACGTCGT TCTCCTGTTC CTTCTGCTTG 2520

CAGACGCGCG CGTCTGCTCC TGCTTGTGGA TGATGTTACT CATATCCCAA GCGGAGGCGG 2580

CTTTGGAGAA CCTCGTAATA CTCAATGCAG CATCCCTGGC CGGGACGCAC GGTCTTGTGT 2640

CCTTCCTCGT GTTCTTCTGC TTTGCGTGGT ATCTGAAGGG TAGGTGGGTG CCCGGAGCGG 2700

TCTACGCCTT CTACGGGATG TGGCCTCTCC TCCTGCTCCT GCTGGCGTTG CCTCAGCGGG 2760

CATACGCACT GGACACGGAG GTGGCCGCGT CGTGTGGCGG CGTTGTTCTT GTCGGGTTAA 2820

TGGCGCTGAC TCTGTCACCA TATTACAAGC GCTATATCAG CTGGTGCATG TGGTGGCTTC 2880

AGTATTTTCT GACCAGAGTA GAAGCGCAAC TGCACGTGTG GGTTCCCCCC CTCAACGTCC 2940

GGGGGGGGCG CGATGCCGTC ATCTTACTCA TGTGTGTTGT ACACCCGACT CTGGTATTTG 3000

ACATCACCAA ACTACTCCTG GCCATCTTCG GACCCCTTTG GATTCTTCAA GCCAGTTTGC 3060

TTAAAGTCCC CTACTTCGTG CGCGTTCAAG GCCTTCTCCG GATCTGCGCG CTAGCGCGGA 3120

AGATAGCCGG AGGTCATTAC GTGCAAATGG CCATCATCAA GTTGGGGGCG CTTACTGGCA 3180

CCTATGTGTA TAACCATCTC ACCCCTCTTC GAGACTGGGC GCACAACGGC CTGCGAGATC 3240

TGGCCGTGGC TGTGGAACCA GTCGTCTTCT CCCGAATGGA GACCAAGCTC ATCACGTGGG 3300

GGGCAGATAC CGCCGCGTGC GGTGACATCA TCAACGGCTT GCCCGTCTCT GCCCGTAGGG 3360

GCCAGGAGAT ACTGCTTGGA CCAGCCGACG GAATGGTCTC CAAGGGGTGG AGGTTGCTGG 3420

CGCCCATCAC GGCGTACGCC CAGCAGACGA GAGGCCTCCT AGGGTGTATA ATCACCAGCC 3480

TGACTGGCCG GGACAAAAAC CAAGTGGAGG GTGAGGTCCA GATCGTGTCA ACTGCTACCC 3540

AAACCTTCCT GGCAACGTGC ATCAATGGGG TATGCTGGAC TGTCTACCAC GGGGCCGGAA 3600

CGAGGACCAT CGCATCACCC AAGGGTCCTG TCATCCAGAT GTATACCAAT GTGGACCAAG 3660

ACCTTGTGGG CTGGCCCGCT CCTCAAGGTT CCCGCTCATT GACACCCTGC ACCTGCGGCT 3720

CCTCGGACCT TTACCTGGTT ACGAGGCACG CCGACGTCAT TCCCGTGCGC CGGCGAGGTG 3780

ATAGCAGGGG TAGCCTGCTT TCGCCCCGGC CCATTTCCTA CCTAAAAGGC TCCTCGGGGG 3840

GTCCGCTGTT GTGCCCCGCG GGACACGCCG TGGGCCTATT CAGGGCCGCG GTGTGCACCC 3900

GTGGAGTGAC CAAGGCGGTG GACTTTATCC CTGTGGAGAA CCTAGAGACA ACCATGAGAT 3960

CCCCGGTGTT CACGGACAAC TCCTCTCCAC CAGCAGTGCC CCAGAGCTTC CAGGTGGCCC 4020

ACCTGCATGC TCCCACCGGC AGTGGTAAGA GCACCAAGGT CCCGGCTGCG TACGCAGCCC 4080

AGGGCTACAA GGTGTTGGTG CTCAACCCCT CTGTTGCTGC AACGCTGGGC TTTGGTGCTT 4140

ACATGTCCAA GGCCCATGGG GTCGATCCTA ATATCAGGAC CGGGGTGAGA ACAATTACCA 4200

CTGGCAGCCC CATCACGTAC TCCACCTACG GCAAGTTCCT TGCCGACGGC GGGTGCTCAG 4260

GAGGCGCTTA TGACATAATA ATTTGTGACG AGTGCCACTC CACGGATGCC ACATCCATCT 4320

TGGGCATCGG CACTGTCCTT [illegible] AGACTGCGGG GGCGAGATTG GTTGTGCTCG 4380

CCACTGCTAC CCCTCCGGGC TCCGTCACTG TGTCCCATCC TAACATCGAG GAGGTTGCTC 4440

TGTCCACCAC CGGAGAGATC CCTTTCTACG GCAAGGCTAT CCCCCTCGAG GTGATCAAGG 4500

GGGGAAGACA TCTCATCTTC TGTCACTCAA [illegible] CGACGAGCTC GCCGCGAAGC 4560

TGGTCGCATT GGGCATCAAT GCCGTGGCCT ACTACCGCGG ACTTGACGTG TCTGTCATCC 4620

CGACCAACGG CGATGTTGTC GTCGTGTCGA CCGATGCTCT CATGACTGGC TTTACCGGCG 4680

ACTTCGACTC TGTGATAGAC TGCAACACGT GTGTCACTCA GACAGTCGAT TTCAGCCTTG 4740

ACCCTACCTT TACCATTGAG ACAACCACGC TCCCCCAGGA TGCTGTCTCC AGGACTCAGC 4800

GCCGGGGCAG GACTGGCAGG GGGAAGCCAG GCATCTACAG ATTTGTGGCA CCGGGGGAGC 4860

GCCCCTCCGG CATGTTCGAC TCGTCCGTCC TCTGTGAGTG CTATGACGCG GGCTGTGCTT 4920

GGTATGAGCT CATGCCCGCC GAGACTACAG TTAGGCTACG AGCGTACATG AACACCCCGG 4980

GGCTTCCCGT GTGCCAGGAC CATCTTGAAT TTTGGGAGGG CGTCTTTACG GGCCTCACCC 5040

ATATAGATGC CCACTTTCTA TCCCAGACAA AGCAGAGTGG GGAGAACTTT CCTTACCTGG 5100

TAGCGTACCA AGCCACCGTG TGCGCTAGGG CTCAAGCCCC TCCCCCATCG TGGGACCAGA 5160

TGTGGAAGTG TTTGATCCGC CTTAAACCCA CCCTCCATGG GCCAACACCC CTGCTATACA 5220

GACTGGGCGC TGTTCAGAAT GAAGTCACCC TGACGCACCC AATCACCAAA TACATCATGA 5280

CATGCATGTC GGCCGACCTG GAGGTCGTCA CGAGCACCTG GGTGCTCGTT GGCGGCGTCC 5340

TGGCTGCTCT GGCCGCGTAT TGCCTGTCAA CAGGCTGCGT GGTCATAGTG GGCAGGATTG 5400

TCTTGTCCGG GAAGCCGGCA ATTATACCTG ACAGGGAGGT TCTCTACCAG GAGTTCGATG 5460

AGATGGAAGA GTGCTCTCAG CACTTACCGT ACATCGAGCA AGGGATGATG CTCGCTGAGC 5520

AGTTCAAGCA GAAGGCCCTC GGCCTCCTGC AGACCGCGTC CCGCCATGCA GAGGTTATCA 5580

CCCCTGCTGT CCAGACCAAC TGGCAGAAAC TCGAGGTCTT CTGGGCGAAG CACATGTGGA 5640

ATTTCATCAG TGGGATACAA TATTTGGCGG GCCTGTCAAC GCTGCCTGGT AACCCCGCCA 5700

TTGCTTCATT GATGGCTTTT ACAGCTGCCG TCACCAGCCC ACTAACCACT GGCCAAACCC 5760

TCCTCTTCAA CATATTGGGG GGGTGGGTGG CTGCCCAGCT CGCCGCCCCC GGTGCCGCTA 5820

CCGCCTTTGT GGGCGCTGGC TTAGCTGGCG CCGCCATCGG CAGCGTTGGA CTGGGGAAGG 5880

TCCTCGTGGA CATTCTTGCA GGGTATGGCG CGGGCGTGGC GGGAGCTCTT GTAGCATTCA 5940

AGATCATGAG CGGTGAGGTC CCCTCCACGG AGGACCTGGT CAATCTGCTG CCCGCCATCC 6000

TCTCGCCTGG AGCCCTTGTA GTCGGTGTGG TCTGCGCAGC AATACTGCGC CGGCACGTTG 6060

GCCCGGGCGA GGGGGCAGTG CAATGGATGA ACCGGCTAAT AGCCTTCGCC TCCCGGGGGA 6120

ACCATGTTTC CCCCACGCAC TACGTGCCGG AGAGCGATGC AGCCGCCCGC GTCACTGCCA 6180

TACTCAGCAG CCTCACTGTA ACCCAGCTCC TGAGGCGACT ACATCAGTGG ATAAGCTCGG 6240

AGTGTACCAC TCCATGCTCC GGCTCCTGGC TAAGGGACAT CTGGGACTGG ATATGCGAGG 6300

TGCTGAGCGA CTTTAAGACC TGGCTGAAAG CCAAGCTCAT GCCACAACTG CCTGGGATTC 6360

CCTTTGTGTC CTGCCAGCGC GGGTATAGGG GGGTCTGGCG AGGAGACGGC ATTATGCACA 6420

CTCGCTGCCA CTGTGGAGCT GAGATCACTG GACATGTCAA AAACGGGACG ATGAGGATCG 6480

TCGGTCCTAG GACCTGCAGG AAGATGTGGA GTGGGACGTT CCCCATTAAC GCCTACACCA 6540

CGGGCCCCTG TACTCCCCTT CCTGCGCCGA ACTATAAGTT CGCGCTGTGG AGGGTGTCTG 6600

CAGAGGAATA CGTGGAGATA AGGCGGGTGG GGGACTTCCA CTACGTATCG GGTATGACTA 6660

CTGACAATCT TAAATGCCCG TGCCAGATCC CATCGCCCGA ATTTTTCACA GAATTGGACG 6720

GGGTGCGCCT ACATAGGTTT GCGCCCCCTT GCAAGCCCTT GCTGCGGGAG GAGGTATCAT 6780

TCAGAGTAGG ACTCCACGAG TACCCGGTGG GGTCGCAATT ACCTTGCGAG CCCGAACCGG 6840

ACGTAGCCGT GTTGACGTCC ATGCTCACTG ATCCCTCCCA TATAACAGCA GAGGCGGCCG 6900

GGAGAAGGTT GGCGAGAGGG TCACCCCCTT CTATGGCCAG CTCCTCGGCC AGCCAGCTGT 6960

CCGCTCCATC TCTCAAGGCA ACTTGCACCG CCAACCATGA CTCCCCTGAC GCCGAGCTCA 7020

TAGAGGCTAA CCTCCTGTGG AGGCAGGAGA TGGGCGGCAA CATCACCAGG GTTGAGTCAG 7080

AGAACAAAGT GGTGATTCTG GACTCCTTCG ATCCGCTTGT GGCAGAGGAG GATGAGCGGG 7140

AGGTCTCCGT ACCCGCAGAA ATTCTGCGGA AGTCTCGGAG ATTCGCCCGG GCCCTGCCCG 7200

TTTGGGCGCG GCCGGACTAC AACCCCCCGC TAGTAGAGAC GTGGAAAAAG CCTGACTACG 7260

AACCACCTGT GGTCCATGGC TGCCCGCTAC CACCTCCACG GTCCCCTCCT GTGCCTCCGC 7320

CTCGGAAAAA GCGTACGGTG GTCCTCACCG AATCAACCCT ACCTACTGCC TTGGCCGAGC 7380

TTGCCACCAA AAGTTTTGGC AGCTCCTCAA CTTCCGGCAT TACGGGCGAC AATATGACAA 7440

CATCCTCTGA GCCCGCCCCT TCTGGCTGCC CCCCCGACTC CGACGTTGAG TCCTATTCTT 7500

CCATGCCCCC CCTGGAGGGG GAGCCTGGGG ATCCGGATTT CAGCGACGGG [illegible] 7560

CGGTCAGTAG TGGGGCCGAC ACGGAAGATG TCGTGTGCTG CTCAATGTCT [illegible] 7620

CAGGCGCACT CGTCACCCCG TGCGCTGCGG AAGAACAAAA ACTGCCCATC [illegible] 7680

GCAACTCGTT GCTACGCCAT CACAATCTGG TATATTCCAC [illegible] [illegible] 7740

AAAGGCAGAA GAAAGTCACA TTTGACAGAC TGCAAGTTCT [illegible] [illegible] 7800

TGCTCAAGGA GGTCAAAGCA GCGGCGTCAA AAGTGAAGGC [illegible] [illegible] 7860

AAGCTTGCAG CCTGACGCCC [illegible] [illegible] [illegible] GGGGCAAAAG 7920

ACGTCCGTTG [illegible] [illegible] [illegible] CTCCGTGTGG AAAGACCTTC 7980

TGGAAGACAG [illegible] [illegible] [illegible] CAAGAACGAG GTCTTCTGCG 8040

TTCAGCCTGA [illegible] [illegible] [illegible] [illegible] GACCTGGGCG 8100

TGCGCGTGTG [illegible] [illegible] [illegible] CAAACTCCCC CTGGCCGTGA 8160

TGGGAAGCTC [illegible] [illegible] [illegible] [illegible] CTCGTGCAAG 8220

CGTGGAAGTC CAAGAAGACC [illegible] [illegible] [illegible] TTTGACTCCA 8280

CAGTCACTGA GAGCGACATC [illegible] [illegible] [illegible] GACCTGGACC 8340

CCCAAGCCCG CGTGGCCATC [illegible] [illegible] [illegible] GGCCCTCTTA 8400

CCAATTCAAG GGGGGAAAAC [illegible] [illegible] [illegible] [illegible] 8460

CTAGCTGTGG TAACACCCTC ACTTGCTACA [illegible] [illegible] CGAGCCGCAG 8520

GGCTCCAGGA CTGCACCATG CTCGTGTGTG [illegible] [illegible] [illegible] 8580

CGGGGGTCCA GGAGGACGCG GCGAGCCTGA GAGCCTTTAC [illegible] [illegible] 8640

CCGCCCCCCC CGGGGACCCC CCACAACCAG AATACGACTT [illegible] [illegible] 8700

CCTCCAACGT GTCAGTCGCC CACGACGGCG CTGGAAAAAG [illegible] [illegible] 8760

ACCCTACAAC CCCCCTCGCG AGAGCCGCGT GGGAGACAGC AAGACACACT [illegible] 8820

CCTGGCTAGG CAACATAATC ATGTTTGCCC CCACACTGTG GGCGAGGATG ATACTGATGA 8880

[illegible] TAGCGTCCTC ATAGCCAGGG ATCAGCTTGA ACAGGCTCTT AACTGTGAGA 8940

TCTACGCAGC CTGCTACTCC ATAGAACCAC TGGATCTACC TCCAATCATT CAAAGACTCC 9000

ATGGCCTCAG CGCATTTTTA CTCCACAGTT ACTCTCCAGG TGAAGTCAAT AGGGTGGCCG 9060

CATGCCTCAG AAAACTTGGG GTCCCGCCCT TGCGAGCTTG GAGACACCGG GCCCGGAGCG 9120

TCCGCGCTAG GCTTCTGTCC AGGGGAGGCA GGGCTGCCAT ATGTGGCAAG TACCTCTTCA 9180

ACTGGGCAGT AAGAACAAAG CTCAAACTCA CTCCAATAGC GGCCGCTGGC CGGCTGGACT 9240

TGTCCGGTTG GTTCACGGCT GGCTACAGCG GGGGAGACAT TTATCACAGC GTGTCTCATG 9300

CCCGGCCCCG CTGGTTCTGG TTTTGCCTAC TCCTGCTCGC TGCAGGGGTA GGCATCTACC 9360

TCCTCCCCAA CCGGTGAACG GGGAGCTAGA CACTCCGGCC T 9401
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide primer"

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GGACTAGTCT GCAGTCTAGA GCTCCATGGC GCCCATCACG GCGTACG 47

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CGGCTGCACC TCCAGCAGTG CATTTTAGAT CTTAAGTCTA GAAG 44 
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
AAGCTACTCT ACCTTCTCAC GATTTTAGAT CTTAAGTCTA GAAG 44

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide primer"

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GGATCTCTGT TGGTACTCTA GGATTTTAGA TCTTAAGTCT AGAAG 45
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 64 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE DUPLEX"

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(v) FRAGMENT TYPE: internal

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..4

(D) OTHER INFORMATION: /product= "SINGLE STRANDED REGION
ON CODING STRAND"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 61..64

(D) OTHER INFORMATION: /product= "SINGLE STRANDED REGION
ON COMPLEMENTARY STRAND"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

TGCACGGCGC CGACACGGAA GATGTCGTGT GCTGCTCAAT GTCTTATACC TGGACAGGCG 60

TGCA 64

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) HYPOTHETICAL: NO
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Gly Ala Asp Thr Glu Asp Val Val Cys Cys Ser Met Ser Tyr Thr Trp
1 5 10 15

Thr Gly Val His
20

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide primer"

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GGACTAGTCT GCAGTCTAGA GCTCCATGGC GCCCATCACG GCGTACG 47 
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGGGAGCCGG AGGACGTCTG GCGCAGG 27 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1497 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



SUBSTITUTE SHEET (RULE 26) 



wo 96/34976 



PCT/US96/06070 



-29- 



(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 87..893



(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 426..427

(D) OTHER INFORMATION: /label= ApaLIsite



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:



ACCAACCTCT TCGAGGCACA AGGCACAACA GGCTGCTCTG GGATTCTCTT CAGCCAATCT 60 

TCATTGCTCA AGTGTCTGAA GCAGCC ATG GCA GAA GTA CCT GAG CTC GCC AGT 113 

Met Ala Glu Val Pro Glu Leu Ala Ser 
1 5 



GAA ATG ATG GCT TAT TAC AGT GGC AAT GAG GAT GAC TTG TTC TTT GAA 161
Glu Met Met Ala Tyr Tyr Ser Gly Asn Glu Asp Asp Leu Phe Phe Glu
10 15 20 25

GCT GAT GGC CCT AAA CAG ATG AAG TGC TCC TTC CAG GAC CTG GAC CTC 209
Ala Asp Gly Pro Lys Gln Met Lys Cys Ser Phe Gln Asp Leu Asp Leu
30 35 40

TGC CCT CTG GAT GGC GGC ATC CAG CTA CGA ATC TCC GAC CAC CAC TAC 257
Cys Pro Leu Asp Gly Gly Ile Gln Leu Arg Ile Ser Asp His His Tyr
45 50 55

AGC AAG GGC TTC AGG CAG GCC GCG TCA GTT GTT GTG GCC ATG GAC AAG 305
Ser Lys Gly Phe Arg Gln Ala Ala Ser Val Val Val Ala Met Asp Lys
60 65 70

CTG AGG AAG ATG CTG GTT CCC TGC CCA CAG ACC TTC CAG GAG AAT GAC 353
Leu Arg Lys Met Leu Val Pro Cys Pro Gln Thr Phe Gln Glu Asn Asp
75 80 85

CTG AGC ACC TTC TTT CCC TTC ATC TTT GAA GAA GAA CCT ATC TTC TTC 401
Leu Ser Thr Phe Phe Pro Phe Ile Phe Glu Glu Glu Pro Ile Phe Phe
90 95 100 105

GAC ACA TGG GAT AAC GAG GCT TAT GTG CAC GAT GCA CCT GTA CGA TCA 449
Asp Thr Trp Asp Asn Glu Ala Tyr Val His Asp Ala Pro Val Arg Ser
110 115 120

CTG AAC TGC ACG CTC CGG GAC TCA CAG CAA AAA AGC TTG GTG ATG TCT 497
Leu Asn Cys Thr Leu Arg Asp Ser Gln Gln Lys Ser Leu Val Met Ser
125 130 135

GGT CCA TAT GAA CTG AAA GCT CTC CAC CTC CAG GGA CAG GAT ATG GAG 545

Gly Pro Tyr Glu Leu Lys Ala Leu His Leu Gln Gly Gln Asp Met Glu
140 145 150

CAA CAA GTG GTG TTC TCC ATG TCC TTT GTA CAA GGA GAA GAA AGT AAT 593

Gln Gln Val Val Phe Ser Met Ser Phe Val Gln Gly Glu Glu Ser Asn

155 160 165

GAC AAA ATA CCT GTG GCC TTG GGC CTC AAG GAA AAG AAT CTG TAC CTG 641

Asp Lys Ile Pro Val Ala Leu Gly Leu Lys Glu Lys Asn Leu Tyr Leu
170 175 180 185



TCC TGC GTG TTG AAA GAT GAT AAG CCC ACT CTA CAG CTG GAG AGT GTA 689
Ser Cys Val Leu Lys Asp Asp Lys Pro Thr Leu Gln Leu Glu Ser Val
190 195 200

GAT CCC AAA AAT TAC CCA AAG AAG AAG ATG GAA AAG CGA TTT GTC TTC 737
Asp Pro Lys Asn Tyr Pro Lys Lys Lys Met Glu Lys Arg Phe Val Phe
205 210 215

AAC AAG ATA GAA ATC AAT AAC AAG CTG GAA TTT GAG TCT GCC CAG TTC 785

Asn Lys Ile Glu Ile Asn Asn Lys Leu Glu Phe Glu Ser Ala Gln Phe
220 225 230

CCC AAC TGG TAC ATC AGC ACC TCT CAA GCA GAA AAC ATG CCC GTC TTC 833
Pro Asn Trp Tyr Ile Ser Thr Ser Gln Ala Glu Asn Met Pro Val Phe
235 240 245

CTG GGA GGG ACC AAA GGC GGC CAG GAT ATA ACT GAC TTC ACC ATG CAA 881

Leu Gly Gly Thr Lys Gly Gly Gln Asp Ile Thr Asp Phe Thr Met Gln
250 255 260 265

TTT GTG TCT TCC TAAAGAGAGC TGTACCCAGA GAGTCCTGTG CTGAATGTGG 933
Phe Val Ser Ser

ACTCAATCCC TAGGGCTGGC AGAAAGGGAA CAGAAAGGTT TTTGAGTACG GCTATAGCCT 993

GGACTTTCCT GTTGTCTACA CCAATGCCCA ACTGCCTGCC TTAGGGTAGT GCTAAGAGGA 1053 

TCTCCTGTCC ATCAGCCAGG ACAGTCAGCT CTCTCCTTTC AGGGCCAATC CCCAGCCCTT 1113 

TTGTTGAGCC AGGCCTCTCT CACCTCTCCT ACTCACTTAA AGCCCGCCTG ACAGAAACCA 1173 

CGGCCACATT TGGTTCTAAG AAACCCTCTG TCATTCGCTC CCACATTCTG ATGAGCAACC 1233 

GCTTCCCTAT TTATTTATTT ATTTGTTTGT TTGTTTTATT CATTGGTCTA ATTTATTCAA 1293

AGGGGGCAAG AAGTAGCAGT GTCTGTAAAA GAGCCTAGTT TTTAATAGCT ATGGAATCAA 1353 

TTCAATTTGG ACTGGTGTGC TCTCTTTAAA TCAAGTCCTT TAATTAAGAC TGAAAATATA 1413 

TAAGCTCAGA TTATTTAAAT GGGAATATTT ATAAATGAGC AAATATCATA CTGTTCAATG 1473

GTTCTGAAAT AAACTTCTCT GAAG 1497

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 269 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ala Glu Val Pro Glu Leu Ala Ser Glu Met Met Ala Tyr Tyr Ser
1 5 10 15

Gly Asn Glu Asp Asp Leu Phe Phe Glu Ala Asp Gly Pro Lys Gln Met
20 25 30

Lys Cys Ser Phe Gln Asp Leu Asp Leu Cys Pro Leu Asp Gly Gly Ile
35 40 45

Gln Leu Arg Ile Ser Asp His His Tyr Ser Lys Gly Phe Arg Gln Ala
50 55 60

Ala Ser Val Val Val Ala Met Asp Lys Leu Arg Lys Met Leu Val Pro
65 70 75 80

Cys Pro Gln Thr Phe Gln Glu Asn Asp Leu Ser Thr Phe Phe Pro Phe
85 90 95

Ile Phe Glu Glu Glu Pro Ile Phe Phe Asp Thr Trp Asp Asn Glu Ala
100 105 110

Tyr Val His Asp Ala Pro Val Arg Ser Leu Asn Cys Thr Leu Arg Asp
115 120 125

Ser Gln Gln Lys Ser Leu Val Met Ser Gly Pro Tyr Glu Leu Lys Ala
130 135 140

Leu His Leu Gln Gly Gln Asp Met Glu Gln Gln Val Val Phe Ser Met
145 150 155 160

Ser Phe Val Gln Gly Glu Glu Ser Asn Asp Lys Ile Pro Val Ala Leu
165 170 175

Gly Leu Lys Glu Lys Asn Leu Tyr Leu Ser Cys Val Leu Lys Asp Asp
180 185 190

Lys Pro Thr Leu Gln Leu Glu Ser Val Asp Pro Lys Asn Tyr Pro Lys
195 200 205

Lys Lys Met Glu Lys Arg Phe Val Phe Asn Lys Ile Glu Ile Asn Asn
210 215 220

Lys Leu Glu Phe Glu Ser Ala Gln Phe Pro Asn Trp Tyr Ile Ser Thr
225 230 235 240

Ser Gln Ala Glu Asn Met Pro Val Phe Leu Gly Gly Thr Lys Gly Gly
245 250 255

Gln Asp Ile Thr Asp Phe Thr Met Gln Phe Val Ser Ser
260 265
260 265 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "oligonucleotide primer" 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CTCGGCCTCC TGCAGGCACC TGTACGATCA CTGAAC 36
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide primer"

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GTTAAACACA GAAGGATTTT AGATCTTAAG GG 32
CLAIMS



I claim: 



1. A method for assaying exogenous protease 
activity in a host cell comprising the steps of: 

(a) incubating a host cell transformed with a
first nucleotide sequence encoding an exogenous protease and a 
second nucleotide sequence encoding an artificial polypeptide 
substrate; 

wherein said substrate comprises: 

(i) a cleavage site for said exogenous 

protease; and 

(ii) a polypeptide that is secreted out of 
said cell following cleavage by said exogenous protease; 

under conditions which cause said exogenous protease and said 
artificial substrate to be expressed; 

(b) separating said host cell from its growth 
media under non-lytic conditions; and 

(c) assaying said growth media for the 
presence of said secreted polypeptide. 

2. A method for assaying endogenous protease 
activity in a host cell comprising the steps of: 

(a) incubating a host cell transformed with a 
nucleotide sequence encoding an artificial polypeptide 
substrate; 

wherein said substrate comprises: 

(i) a cleavage site for said endogenous 

protease; and 

(ii) a polypeptide that is secreted out of 
said cell following cleavage by said endogenous protease; 

under conditions which cause said artificial substrate to be 
expressed; 

(b) separating said host cell from its growth 
media under non-lytic conditions; and 

(c) assaying said growth media for the 
presence of said secreted polypeptide- 
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3. A method for identifying a compound as an 
inhibitor of a protease comprising the steps of: 

(a) assaying the activity of a protease in the
absence of said compound by a method according to claim 1 or 2;

(b) assaying the activity of a protease in the
presence of said compound by a method according to claim 1 or
2, wherein said compound is added to the host cells during
said incubation of said host cells; and

(c) comparing the results of step (a) with the
results of step (b).
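Claim 3 does not fix the form of the comparison in step (c). One common way to express such a comparison, assumed here and not stated in the claims, is percent inhibition of the secreted-polypeptide signal measured in the growth media. A background reading from a suitable negative control can optionally be subtracted first. The following is a minimal Python sketch. The function name `percent_inhibition` and its `background` parameter are hypothetical, and the readout units depend on whichever standard assay is used to quantitate the secreted polypeptide.

```python
def percent_inhibition(signal_without: float, signal_with: float,
                       background: float = 0.0) -> float:
    """Hypothetical helper: percent inhibition of protease activity,
    estimated from the amount of secreted polypeptide in the media.

    signal_without: readout from step (a), compound absent
    signal_with:    readout from step (b), compound present
    background:     assumed negative-control readout, subtracted from both
    """
    uninhibited = signal_without - background
    if uninhibited <= 0:
        # Without a signal above background there is nothing to compare.
        raise ValueError("uninhibited signal must exceed background")
    inhibited = signal_with - background
    return 100.0 * (1.0 - inhibited / uninhibited)


# Example: the compound reduces the secreted signal from 1000 to 250 units.
print(percent_inhibition(1000.0, 250.0))
```

A ratio against the step (a) readout makes the result independent of the absolute units of the assay. This is a design assumption, not a requirement of the claim.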

4. The method according to claim 1 or claim 3, 
insofar as it depends from claim 1, wherein said first 
nucleotide sequence and said second nucleotide sequence encode 
a single polypeptide. 

5. The method according to claim 4, wherein said 
first and second nucleotide sequences encode NS3-4A-A4B-IL-1β.

6. The method according to any one of claims 1 to
3, wherein said first nucleotide sequence encodes a viral
protease or an enzymatically active fragment thereof.

7. The method according to claim 6, wherein said 
first nucleotide sequence encodes hepatitis C virus NS3 
protease, an NS3-4A fusion protein or amino acids 1-180 of NS3 
protease.

8. The method according to any one of claims 1 to
3, wherein said secreted polypeptide is selected from
polypeptides comprising mature IL-1β, mature IL-1α, basic
fibroblast growth factor and endothelial-monocyte activating
polypeptide II.

9. The method according to claim 8, wherein said 
secreted polypeptide comprises mature IL-1β.
10. The method according to claim 9, wherein said
artificial polypeptide substrate is selected from pre-IL-1β*
or pre-IL-1β (CSM).

11. A host cell transformed with a nucleotide
sequence encoding an artificial polypeptide substrate, wherein
said substrate comprises:

(a) a cleavage site for said exogenous protease; and

(b) a polypeptide that is secreted out of said 
cell following cleavage by said exogenous protease; 

said host cell being capable of expressing said protease and 
said substrate. 

12. A host cell transformed with a first nucleotide 
sequence encoding an exogenous protease and a second 
nucleotide sequence encoding an artificial polypeptide 
substrate, wherein said substrate comprises: 

(a) a cleavage site for said exogenous protease; and

(b) a polypeptide that is secreted out of said 
cell following cleavage by said exogenous protease; 

said host cell being capable of expressing said protease and 
said substrate. 

13. The host cell according to claim 11 or 12, 
wherein said secreted polypeptide is selected from 
polypeptides comprising mature IL-1β, mature IL-1α, basic
fibroblast growth factor and endothelial-monocyte activating 
polypeptide II. 



14. The host cell according to claim 13, wherein 
said secreted polypeptide comprises mature IL-1β.
15. The host cell according to claim 14, wherein
said artificial polypeptide substrate is selected from pre-IL-1β*
or pre-IL-1β (CSM).

16. The host cell according to claim 12, wherein 
said first nucleotide sequence and said second nucleotide 
sequence encode a single polypeptide. 

17. The host cell according to claim 16, wherein 
said first and second nucleotide sequences encode NS3-4A-A4B-IL-1β.

18. The host cell according to claim 12, wherein
said first nucleotide sequence encodes a viral protease or an 
enzymatically active fragment thereof. 

19. The host cell according to claim 18, wherein 
said first nucleotide sequence encodes hepatitis C virus NS3 
protease, an NS3-4A fusion protein or amino acids 1-180 of NS3 
protease.

20. The host cell according to claim 11 or 12, 
selected from E. coli, Bacillus, other bacteria, yeast and
other fungi, plant cells, insect cells and mammalian cells.

21. The host cell according to claim 20, wherein 
said host cell is a mammalian cell. 

22. The host cell according to claim 21, wherein 
said host cell is a COS cell. 



23. A recombinant DNA molecule comprising a DNA 
sequence encoding an artificial substrate selected from
pre-IL-1β* and pre-IL-1β (CSM).